A comparative investigation of methods for logistic regression with separated or nearly separated data.

Author

  • Georg Heinze
Abstract

In logistic regression analysis of small or sparse data sets, results obtained by classical maximum likelihood methods cannot generally be trusted. In such analyses it may even happen that the likelihood meets the convergence criteria while at least one parameter estimate diverges to ±infinity. This situation has been termed 'separation', and it typically occurs whenever no events are observed in one of the two groups defined by a dichotomous covariate. More generally, separation is caused by a linear combination of continuous or dichotomous covariates that perfectly separates events from non-events. Separation implies infinite or zero maximum likelihood estimates of odds ratios, which are usually considered unrealistic. I provide some examples of separation and near-separation in clinical data sets and discuss some options to analyse such data, including exact logistic regression analysis and a penalized likelihood approach. Both methods supply finite point estimates in case of separation. Profile penalized likelihood confidence intervals for parameters show excellent behaviour in terms of coverage probability and provide higher power than exact confidence intervals. General advantages of the penalized likelihood approach are discussed.
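The situation described in the abstract can be sketched in a few lines of Python. The example below is an illustration under stated assumptions, not the paper's implementation. The data are made up (no events among 10 subjects in one group, 4 events among 10 in the other). The penalty is assumed to be the Jeffreys-prior (Firth-type) penalty, which the abstract does not name. The helper names `has_empty_cell` and `firth_logistic` are hypothetical.

```python
import numpy as np

def has_empty_cell(x, y):
    """Return True if one level of the binary covariate x contains
    only events or only non-events (a simple separation check)."""
    x = np.asarray(x)
    y = np.asarray(y)
    for level in (0, 1):
        outcomes = y[x == level]
        if outcomes.size and outcomes.min() == outcomes.max():
            return True
    return False

def firth_logistic(X, y, max_iter=100, tol=1e-10):
    """Penalized-likelihood logistic regression (assumed Firth-type penalty),
    fitted by Newton iterations on the modified score equations."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        pi = 1.0 / (1.0 + np.exp(-X @ beta))
        w = pi * (1.0 - pi)
        info_inv = np.linalg.inv(X.T @ (w[:, None] * X))
        # Diagonal of the hat matrix: h_i = w_i * x_i' (X'WX)^{-1} x_i
        h = w * np.einsum("ij,jk,ik->i", X, info_inv, X)
        # Modified score: the usual score plus the penalty contribution
        score = X.T @ (y - pi + h * (0.5 - pi))
        step = info_inv @ score
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    return beta

# Illustrative data (assumption): group x=0 has 4 events in 10 subjects,
# group x=1 has 0 events in 10 subjects, so maximum likelihood diverges.
x = np.repeat([0, 1], 10)
y = np.array([1] * 4 + [0] * 6 + [0] * 10)
X = np.column_stack([np.ones_like(x), x])

separated = has_empty_cell(x, y)
beta = firth_logistic(X, y)
print("separation detected:", separated)
print("penalized log odds ratio:", beta[1])
```

For a single dichotomous covariate the penalized estimate coincides with the odds ratio obtained after adding 0.5 to each cell of the 2×2 table, which gives a convenient check on the fit. The sketch returns point estimates only; the profile penalized likelihood confidence intervals discussed in the abstract are not computed here.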



Journal:
  • Statistics in Medicine

Volume 25, Issue 24

Pages: -

Publication date: 2006